library(readr)
library(mosaic)
library(tidyverse)
library(pander)
library(car)
Math425PastGrades <- read_csv("~/Downloads/Math425PastGrades.csv")
Math425PastGrades[Math425PastGrades == "N"] <- "0"
Math425PastGrades[Math425PastGrades == "Y"] <- "1"
Math425PastGrades[Math425PastGrades == "M"] <- "1"
Math425PastGrades[Math425PastGrades == "F"] <- "2"
Math425PastGrades[is.na(Math425PastGrades)] <- 0
Math425PastGrades$Section <- NULL
Math425PastGrades$Gender <- as.numeric(as.character(Math425PastGrades$Gender))
Math425PastGrades$AttendedAlmostAlways <- as.numeric(as.character(Math425PastGrades$AttendedAlmostAlways))
Math425PastGrades$SpentTimeInOfficeHours <- as.numeric(as.character(Math425PastGrades$SpentTimeInOfficeHours))
Math425PastGrades$ClassActivitiesCompletedPerfectly <- as.numeric(as.character(Math425PastGrades$ClassActivitiesCompletedPerfectly))
Math425PastGrades$SkillsQuizzesCompletedPerfectly <- as.numeric(as.character(Math425PastGrades$SkillsQuizzesCompletedPerfectly))
Math425PastGrades <- Math425PastGrades %>%
  mutate(Analysis = Analysis_CarPrices + Analysis_PredWeather + Theory_Residuals + Theory_SamplingDists,
         Assignments = ifelse(Analysis >= 52, 1, 0))
lm7 <- lm(FinalExam ~ Midterm + I(Midterm^2) + SkillsQuizzesCompletedPerfectly + Assignments, data=Math425PastGrades)

Problem 1 Consider the ?Math425pastgrads data set. The goal of this question is to find a “best model” for predicting which is best, keeping the midterm score and averaging it with the ifnal score, or dropping the midterm score and only going for the final score. I need to decide if I want to rely solely on my final score or not.

Midterm Analysis Decision

I have decided to drop my midterm score.

I didn’t do as well as I would have liked on the midterm. It wasn’t a lack of knowledge, but there were other factors that went into my 24/100. As I created this prediction for my final score, I used my midterm score, and looked at the interaction it had when Skills Quizzes were completed perfectly (1 being completed 0 being incomplete) and the Assignments all completed to at least a score of 13 (13 and above being 1 12 and below being 0).

pander (predict(lm7, newdata = data.frame(Midterm = 24, SkillsQuizzesCompletedPerfectly= 1, Assignments=1), interval= "prediction"))
fit lwr upr
58.64 20 97.27

Based on my midterm score and how I interacted with the other lines, I have the possibility of getting a 20 on the midterm, or even a 97.27. But since it is a possibility I could get even lower than my midterm, I have decided to drop it.

lm7 <- lm(FinalExam ~ Midterm + I(Midterm^2) + SkillsQuizzesCompletedPerfectly + Assignments, data=Math425PastGrades)
pander(summary(lm7))
  Estimate Std. Error t value Pr(>|t|)
(Intercept) 40.33 11.51 3.505 0.0006375
Midterm -0.3891 0.364 -1.069 0.2872
I(Midterm^2) 0.007062 0.002905 2.431 0.0165
SkillsQuizzesCompletedPerfectly 10.52 3.347 3.144 0.002087
Assignments 13.05 3.689 3.537 0.0005717
Fitting linear model: FinalExam ~ Midterm + I(Midterm^2) + SkillsQuizzesCompletedPerfectly + Assignments
Observations Residual Std. Error \(R^2\) Adjusted \(R^2\)
128 18.43 0.4144 0.3953

Graph

plot(FinalExam ~ Midterm, data=Math425PastGrades,
     col=as.factor(SkillsQuizzesCompletedPerfectly))
palette(c("hotpink", "purple", "skyblue", "yellowgreen"))

b<- coef(lm7)
b
##                     (Intercept)                         Midterm 
##                    40.334304950                    -0.389061830 
##                    I(Midterm^2) SkillsQuizzesCompletedPerfectly 
##                     0.007062428                    10.522871619 
##                     Assignments 
##                    13.048160951
drawit <- function(SkillsQuizzesCompletedPerfectly=1, Assignments=1, i=1){
  curve(b[1] + b[2]*Midterm + b[3]*Midterm^2 + b[4]*SkillsQuizzesCompletedPerfectly + b[5]*Assignments, add=TRUE, xname="Midterm", col=palette()[i])
  }

drawit(1,0,1)
drawit(0,1,2)
drawit(1,1,3)
drawit(0,0,4)

Technical Details

Part (a)

Determine which row is most useful in explaining y

Here is me editing my data

Math425PastGrades[Math425PastGrades == "N"] <- "0"
Math425PastGrades[Math425PastGrades == "Y"] <- "1"
Math425PastGrades[Math425PastGrades == "M"] <- "1"
Math425PastGrades[Math425PastGrades == "F"] <- "2"
Math425PastGrades[is.na(Math425PastGrades)] <- 0
Math425PastGrades$Section <- NULL
Math425PastGrades$Gender <- as.numeric(as.character(Math425PastGrades$Gender))
Math425PastGrades$AttendedAlmostAlways <- as.numeric(as.character(Math425PastGrades$AttendedAlmostAlways))
Math425PastGrades$SpentTimeInOfficeHours <- as.numeric(as.character(Math425PastGrades$SpentTimeInOfficeHours))
Math425PastGrades$ClassActivitiesCompletedPerfectly <- as.numeric(as.character(Math425PastGrades$ClassActivitiesCompletedPerfectly))
Math425PastGrades$SkillsQuizzesCompletedPerfectly <- as.numeric(as.character(Math425PastGrades$SkillsQuizzesCompletedPerfectly))
pairs(Math425PastGrades, panel=panel.smooth)

pairs(Math425PastGrades, panel=panel.smooth, col=as.factor(Math425PastGrades$ClassActivitiesCompletedPerfectly))

general tools for multiple linear regression (trying to find the true model)

  1. make a pairs plot
  2. make a model based on some correlation
  3. check p value and r squared
  4. check residuals
  5. add residuals into pairs plot
  6. repeat steps 1-5, adding in significant values and taking out insignificant values

Using Assessment quizzes- I have significant P values,but not a super significant R squared

lm1 <- lm(FinalExam ~ Midterm + AssessmentQuizzes , data=Math425PastGrades)
summary(lm1)
## 
## Call:
## lm(formula = FinalExam ~ Midterm + AssessmentQuizzes, data = Math425PastGrades)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -84.198  -9.025   2.299  11.862  34.467 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       13.97213    6.55979   2.130 0.035134 *  
## Midterm            0.34801    0.09953   3.496 0.000653 ***
## AssessmentQuizzes  0.48903    0.08309   5.885 3.42e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.36 on 125 degrees of freedom
## Multiple R-squared:  0.4093, Adjusted R-squared:  0.3998 
## F-statistic:  43.3 on 2 and 125 DF,  p-value: 5.157e-15
par(mfrow=c(1,3))
plot(lm1, which=1:2)
plot(lm1$residuals, main="Residuals vs Order", xlab="",
     ylab="Residuals")

pairs(cbind(R = lm1$res, Fit = lm1$fit, Math425PastGrades), panel=panel.smooth) 

I’m now trying based on the analysis. I dont have significant p values, so im rejecting this option

lm2 <- lm(FinalExam ~ Midterm + Analysis_PredWeather + Analysis_CarPrices , data=Math425PastGrades)
summary(lm2)
## 
## Call:
## lm(formula = FinalExam ~ Midterm + Analysis_PredWeather + Analysis_CarPrices, 
##     data = Math425PastGrades)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -72.377  -9.108   2.719  11.586  43.183 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           1.17901    8.45242   0.139    0.889    
## Midterm               0.51781    0.09887   5.237 6.77e-07 ***
## Analysis_PredWeather  1.02986    0.77131   1.335    0.184    
## Analysis_CarPrices    0.95507    0.72139   1.324    0.188    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 19.55 on 124 degrees of freedom
## Multiple R-squared:  0.3355, Adjusted R-squared:  0.3194 
## F-statistic: 20.87 on 3 and 124 DF,  p-value: 5.205e-11
par(mfrow=c(1,3))
plot(lm2, which=1:2)
plot(lm2$residuals, main="Residuals vs Order", xlab="",
     ylab="Residuals")

pairs(cbind(R = lm2$res, Fit = lm2$fit, Math425PastGrades), panel=panel.smooth) 

Honestly, I havent done a simple quadratic yet, so im going to try that, okay, really low R squared. not doing that.

lm3 <- lm(FinalExam ~ Midterm + I(Midterm^2) , data=Math425PastGrades)
summary(lm3)
## 
## Call:
## lm(formula = FinalExam ~ Midterm + I(Midterm^2), data = Math425PastGrades)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -71.321  -7.646   4.332  12.656  44.286 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  42.109867  12.550536   3.355  0.00105 **
## Midterm      -0.282206   0.400281  -0.705  0.48211   
## I(Midterm^2)  0.007500   0.003196   2.347  0.02052 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.31 on 125 degrees of freedom
## Multiple R-squared:  0.2774, Adjusted R-squared:  0.2658 
## F-statistic: 23.99 on 2 and 125 DF,  p-value: 1.518e-09
par(mfrow=c(1,3))
plot(lm3, which=1:2)
plot(lm3$residuals, main="Residuals vs Order", xlab="",
     ylab="Residuals")

pairs(cbind(R = lm3$res, Fit = lm3$fit, Math425PastGrades), panel=panel.smooth) 

I’m going to try theory assignments

lm4 <- lm(FinalExam ~ Midterm + Theory_Residuals + Theory_SamplingDists , data=Math425PastGrades)
summary(lm4)
## 
## Call:
## lm(formula = FinalExam ~ Midterm + Theory_Residuals + Theory_SamplingDists, 
##     data = Math425PastGrades)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -76.954  -8.457   2.270  11.887  40.386 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          17.26046    8.16393   2.114 0.036498 *  
## Midterm               0.38935    0.09977   3.902 0.000155 ***
## Theory_Residuals     -0.16462    0.57377  -0.287 0.774661    
## Theory_SamplingDists  2.06763    0.47550   4.348 2.83e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.72 on 124 degrees of freedom
## Multiple R-squared:  0.3911, Adjusted R-squared:  0.3763 
## F-statistic: 26.54 on 3 and 124 DF,  p-value: 2.492e-13
par(mfrow=c(1,3))
plot(lm4, which=1:2)
plot(lm4$residuals, main="Residuals vs Order", xlab="",
     ylab="Residuals")

pairs(cbind(R = lm4$res, Fit = lm4$fit, Math425PastGrades), panel=panel.smooth) 

Office hours also has a pretty low R squared and an insignificant p value

lm5 <- lm(FinalExam ~ Midterm + SpentTimeInOfficeHours , data=Math425PastGrades)
summary(lm5)
## 
## Call:
## lm(formula = FinalExam ~ Midterm + SpentTimeInOfficeHours, data = Math425PastGrades)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -70.522  -9.103   4.838  12.478  49.670 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            17.68978    7.60253   2.327   0.0216 *  
## Midterm                 0.62895    0.09867   6.374 3.23e-09 ***
## SpentTimeInOfficeHours  0.64022    3.68476   0.174   0.8623    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.75 on 125 degrees of freedom
## Multiple R-squared:  0.2457, Adjusted R-squared:  0.2337 
## F-statistic: 20.36 on 2 and 125 DF,  p-value: 2.212e-08
par(mfrow=c(1,3))
plot(lm5, which=1:2)
plot(lm5$residuals, main="Residuals vs Order", xlab="",
     ylab="Residuals")

pairs(cbind(R = lm5$res, Fit = lm5$fit, Math425PastGrades), panel=panel.smooth) 

Okay I’m going to try a model using assessment quizzes, theory assignments,

lm6 <- lm(FinalExam ~ Midterm*AssessmentQuizzes , data=Math425PastGrades)
summary(lm6)
## 
## Call:
## lm(formula = FinalExam ~ Midterm * AssessmentQuizzes, data = Math425PastGrades)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -84.195  -9.029   2.290  11.855  34.465 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                1.388e+01  1.361e+01   1.019   0.3100  
## Midterm                    3.494e-01  1.976e-01   1.768   0.0795 .
## AssessmentQuizzes          4.915e-01  3.152e-01   1.559   0.1215  
## Midterm:AssessmentQuizzes -3.253e-05  4.091e-03  -0.008   0.9937  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.44 on 124 degrees of freedom
## Multiple R-squared:  0.4093, Adjusted R-squared:  0.395 
## F-statistic: 28.64 on 3 and 124 DF,  p-value: 3.88e-14
par(mfrow=c(1,3))
plot(lm6, which=1:2)
plot(lm6$residuals, main="Residuals vs Order", xlab="",
     ylab="Residuals")

pairs(cbind(R = lm6$res, Fit = lm6$fit, Math425PastGrades), panel=panel.smooth) 

Math425PastGrades <- Math425PastGrades %>%
  mutate(Analysis = Analysis_CarPrices + Analysis_PredWeather + Theory_Residuals + Theory_SamplingDists,
         Assignments = ifelse(Analysis >= 52, 1, 0))
lm7 <- lm(FinalExam ~ Midterm + I(Midterm^2) + SkillsQuizzesCompletedPerfectly + Assignments, data=Math425PastGrades)
summary(lm7)
## 
## Call:
## lm(formula = FinalExam ~ Midterm + I(Midterm^2) + SkillsQuizzesCompletedPerfectly + 
##     Assignments, data = Math425PastGrades)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -67.457  -6.016   1.237  12.194  33.465 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     40.334305  11.507329   3.505 0.000638 ***
## Midterm                         -0.389062   0.363990  -1.069 0.287216    
## I(Midterm^2)                     0.007062   0.002905   2.431 0.016498 *  
## SkillsQuizzesCompletedPerfectly 10.522872   3.346596   3.144 0.002087 ** 
## Assignments                     13.048161   3.689094   3.537 0.000572 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.43 on 123 degrees of freedom
## Multiple R-squared:  0.4144, Adjusted R-squared:  0.3953 
## F-statistic: 21.76 on 4 and 123 DF,  p-value: 1.357e-13
ggplot(Math425PastGrades, aes(x=Midterm, y=FinalExam, color=interaction(SkillsQuizzesCompletedPerfectly, Assignments))) +
  geom_point(pch=1) +
  geom_point(aes(y=lm7$fit, cex=0.05)) +
  facet_wrap(~interaction(SkillsQuizzesCompletedPerfectly, Assignments))

lm8 <- lm(FinalExam ~ Midterm * SkillsQuizzesCompletedPerfectly * SpentTimeInOfficeHours , data=Math425PastGrades)
summary(lm8)
## 
## Call:
## lm(formula = FinalExam ~ Midterm * SkillsQuizzesCompletedPerfectly * 
##     SpentTimeInOfficeHours, data = Math425PastGrades)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -66.349  -8.551   3.441  12.083  39.186 
## 
## Coefficients:
##                                                                Estimate
## (Intercept)                                                     16.8141
## Midterm                                                          0.5897
## SkillsQuizzesCompletedPerfectly                                 -2.0864
## SpentTimeInOfficeHours                                         -22.9450
## Midterm:SkillsQuizzesCompletedPerfectly                          0.1433
## Midterm:SpentTimeInOfficeHours                                   0.2438
## SkillsQuizzesCompletedPerfectly:SpentTimeInOfficeHours          53.1088
## Midterm:SkillsQuizzesCompletedPerfectly:SpentTimeInOfficeHours  -0.6074
##                                                                Std. Error
## (Intercept)                                                       13.9726
## Midterm                                                            0.1886
## SkillsQuizzesCompletedPerfectly                                   22.0220
## SpentTimeInOfficeHours                                            19.2159
## Midterm:SkillsQuizzesCompletedPerfectly                            0.2973
## Midterm:SpentTimeInOfficeHours                                     0.2594
## SkillsQuizzesCompletedPerfectly:SpentTimeInOfficeHours            28.6339
## Midterm:SkillsQuizzesCompletedPerfectly:SpentTimeInOfficeHours     0.3845
##                                                                t value Pr(>|t|)
## (Intercept)                                                      1.203  0.23120
## Midterm                                                          3.126  0.00222
## SkillsQuizzesCompletedPerfectly                                 -0.095  0.92468
## SpentTimeInOfficeHours                                          -1.194  0.23481
## Midterm:SkillsQuizzesCompletedPerfectly                          0.482  0.63073
## Midterm:SpentTimeInOfficeHours                                   0.940  0.34912
## SkillsQuizzesCompletedPerfectly:SpentTimeInOfficeHours           1.855  0.06609
## Midterm:SkillsQuizzesCompletedPerfectly:SpentTimeInOfficeHours  -1.580  0.11681
##                                                                  
## (Intercept)                                                      
## Midterm                                                        **
## SkillsQuizzesCompletedPerfectly                                  
## SpentTimeInOfficeHours                                           
## Midterm:SkillsQuizzesCompletedPerfectly                          
## Midterm:SpentTimeInOfficeHours                                   
## SkillsQuizzesCompletedPerfectly:SpentTimeInOfficeHours         . 
## Midterm:SkillsQuizzesCompletedPerfectly:SpentTimeInOfficeHours   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 19.62 on 120 degrees of freedom
## Multiple R-squared:  0.3526, Adjusted R-squared:  0.3149 
## F-statistic: 9.339 on 7 and 120 DF,  p-value: 3.347e-09
par(mfrow=c(1,3))
plot(lm8, which=1:2)
plot(lm8$residuals, main="Residuals vs Order", xlab="",
     ylab="Residuals")

pairs(cbind(R = lm8$res, Fit = lm8$fit, Math425PastGrades), panel=panel.smooth) 

lm9 <- lm(FinalExam ~ Midterm * SkillsQuizzesCompletedPerfectly * AttendedAlmostAlways , data=Math425PastGrades)
summary(lm9)
## 
## Call:
## lm(formula = FinalExam ~ Midterm * SkillsQuizzesCompletedPerfectly * 
##     AttendedAlmostAlways, data = Math425PastGrades)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -66.377  -8.653   1.035  11.197  38.987 
## 
## Coefficients:
##                                                              Estimate
## (Intercept)                                                   -1.5466
## Midterm                                                        0.7375
## SkillsQuizzesCompletedPerfectly                               44.6072
## AttendedAlmostAlways                                          18.5601
## Midterm:SkillsQuizzesCompletedPerfectly                       -0.4062
## Midterm:AttendedAlmostAlways                                  -0.1205
## SkillsQuizzesCompletedPerfectly:AttendedAlmostAlways         -36.4691
## Midterm:SkillsQuizzesCompletedPerfectly:AttendedAlmostAlways   0.4040
##                                                              Std. Error t value
## (Intercept)                                                     13.1645  -0.117
## Midterm                                                          0.1858   3.969
## SkillsQuizzesCompletedPerfectly                                 19.3057   2.311
## AttendedAlmostAlways                                            19.3442   0.959
## Midterm:SkillsQuizzesCompletedPerfectly                          0.2884  -1.408
## Midterm:AttendedAlmostAlways                                     0.2618  -0.460
## SkillsQuizzesCompletedPerfectly:AttendedAlmostAlways            28.2556  -1.291
## Midterm:SkillsQuizzesCompletedPerfectly:AttendedAlmostAlways     0.3931   1.028
##                                                              Pr(>|t|)    
## (Intercept)                                                  0.906672    
## Midterm                                                      0.000123 ***
## SkillsQuizzesCompletedPerfectly                              0.022564 *  
## AttendedAlmostAlways                                         0.339253    
## Midterm:SkillsQuizzesCompletedPerfectly                      0.161680    
## Midterm:AttendedAlmostAlways                                 0.646304    
## SkillsQuizzesCompletedPerfectly:AttendedAlmostAlways         0.199294    
## Midterm:SkillsQuizzesCompletedPerfectly:AttendedAlmostAlways 0.306193    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 19.56 on 120 degrees of freedom
## Multiple R-squared:  0.3566, Adjusted R-squared:  0.319 
## F-statistic: 9.499 on 7 and 120 DF,  p-value: 2.39e-09
par(mfrow=c(1,3))
plot(lm9, which=1:2)
plot(lm9$residuals, main="Residuals vs Order", xlab="",
     ylab="Residuals")

pairs(cbind(R = lm9$res, Fit = lm9$fit, Math425PastGrades), panel=panel.smooth)